Introduction

WHAP measures linear dimensions of seed heads in the field and attempts to estimate mass of seed heads produced. Dimensions-to-mass (d2m) models are developed and used to transform linear dimensions to estimated mass of seed heads. Then, number of seed heads per unit area and average seed head dimensions are used to estimate mass of seed heads per unit area in each of the abundance strata (low, medium and high) for each species.

This section of the document addresses three aspects of using seed head dimensions and models to estimate seed head mass:

A. Model development.
B. Assessment of model performance.
C. Need and strategy to update models.

Model development consists of finding the best models to produce mass estimates. For example, in this stage we determine the form of the functional relationship between dimensions and mass, and whether transformations are necessary.

Once the best models are identified, we determine how good they are using generic measures such as root mean squared error and mean absolute deviation. Models are assessed by k-fold cross-validation and by validation on subsets of data not used to fit the model.

In the last stage (C) we ask: Is it necessary to create a different d2m model for each location and year or is it sufficient to create one model that is applicable over refuges and years? Of course, this question can only be answered for past years and observed locations. One can assume that differences (or lack of differences) persist into the future and decide this once and for all or one can continue to test over time. This document is about exploring the answer to the question using data about seed head dimensions and mass collected in several refuges in 2019 and in Modoc in 2021.

Finally, based on the results of the analyses, we propose a feasible strategy to update models in the future.

We should keep in mind that the uncertainty due to these models is NOT incorporated in the final mass estimates in the current protocol. Thus, the effect of choice of approach on precision is mostly invisible.

Models should also be compared in terms of their impact on estimated mass of seed heads in sub-units and refuges. Statistically significant differences do not necessarily translate into differences of practical significance, particularly when other sources contribute larger variance to the estimates, and considering the additional time cost of creating new models for each location and year.

Data

We have data for Swamp Timothy in six different combinations of location and year, and for Watergrass in four. Smartweed was sampled only in Modoc in 2021. No refuge was sampled more than once to create d2m models. Thus, the variation among groups is mostly spatial, and we cannot assess the temporal variation: there are no data to estimate the possible variation within a refuge among years.

The data are in R files st_all_noO.rds, wgdat.rds and sh_dim_MDC2021.rds. The first two are based on data collected in 2019, whereas the last one is based on data collected in Modoc in 2021.

##  [1] "CLS"        "KRN"        "PXL"        "SAC"        "STL"       
##  [6] "PIX"        "SL_"        "Colusa"     "Sacramento" "StoneLakes"
## [11] "MDC"

Model development

This section is organized by species. Data for each species are explored numerically and graphically to get a first idea of what models might work and what transformation may be necessary. Then, several models are tried and compared using statistics such as MSE, Rsq and AIC. A final model is selected to be used and evaluated in cross-validation and external validation.

Models for Swamp Timothy

Data exploration

There are 136 observations in the Swamp Timothy d2m data set.

Observations per group (CLS_2019, KRN_2019, MDC_2021, PIX_2019, SAC_2019, SLW_2019): 29, 21, 18, 18, 31, 19

The scatter plot has both axes in logarithmic scale, which accounts for the increasing variance as seed heads get longer. Data and regression lines for all groups cluster closely, with the exception of SAC_2019. There is some indication that, for any given seed head length, SAC had heavier Swamp Timothy seed heads than the rest of the groups.

Random-effect approach Swamp Timothy

Each group of observations within a refuge and year can be considered a random draw of all possible groups. This creates a basis to estimate seed head mass for unobserved groups. In the case of the observed groups we make predictions that include the random effect of the group and exclude the uncertainty due to group. In the case of unobserved new groups, predictions are made at the population level and uncertainty includes a variance component due to deviations of groups about the population (Gelman and Hill, 2007).

In this approach, the question of whether a new refuge or year needs to be sampled can be addressed by exploring the magnitude of group-level variance in slope and intercept. A lot of variance in either or both of the estimated parameters indicates that groups can differ greatly from each other, and thus a new group would have a large probability of being very different from the average of the groups observed. Large variance suggests sampling all new groups to avoid the uncertainty among groups. Small variance indicates that marginal predictions might be sufficient. Variance magnitude is evaluated using the second-order coefficient of variation (Kvålseth 2017).
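The second-order coefficient of variation can be sketched in a few lines of base R. This assumes Kvålseth's (2017) definition, cv2 = cv / sqrt(1 + cv^2), which is bounded in [0, 1); the function name is ours, not from the report's code.

```r
# Second-order coefficient of variation (a sketch, assuming Kvålseth's
# 2017 definition: cv2 = cv / sqrt(1 + cv^2), which is bounded in [0, 1)).
cv2 <- function(x) {
  cv <- sd(x) / mean(x)   # ordinary coefficient of variation
  cv / sqrt(1 + cv^2)     # second-order CV
}
```

For small cv the two measures are nearly identical; unlike the ordinary cv, cv2 cannot exceed 1 even for highly dispersed data.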

Variance increases with fitted value in all data sets and there is indication of slight non-linearity. We try a Box-Cox transformation that is:

\[ Y' = \begin{cases} \dfrac{Y^\lambda - 1}{\lambda} & \text{for } \lambda \neq 0 \\[1ex] \log(Y) & \text{for } \lambda = 0 \end{cases} \]

Predictions resulting from the models where the response received a Box-Cox transformation have to be back transformed into mass in mg by the inverse transformation. When \(\lambda = 0\) the transformation is the log of the response and it is inverted with the exponential. Otherwise the back transformation is:

\[ Y = (\lambda \, Y' + 1)^{1/\lambda} \]

which in R code is (Y = lambda * Y_prime + 1) ^ (1/lambda), where \(Y\) is mass in mg and \(Y'\) is the transformed value provided by using the model.

# Define function to invert the Box-Cox transformation
invBoxCox <- function(x, lambda) {
  if (lambda == 0) exp(x) else (lambda * x + 1) ^ (1 / lambda)}
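A quick round-trip check confirms that invBoxCox undoes the forward transformation. The helper bcp() below mimics the Box-Cox parameterization used by car::bcPower, (y^lambda - 1)/lambda with log(y) at lambda = 0; it is a stand-in so the sketch runs without the car package.

```r
# Round-trip sanity check for the inverse transformation. bcp() mimics the
# car::bcPower parameterization (stand-in so this runs without car).
bcp <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}
invBoxCox <- function(x, lambda) {
  if (lambda == 0) exp(x) else (lambda * x + 1)^(1 / lambda)
}

mass <- c(13.2, 22.4, 32.8, 43.1)   # example masses in mg
for (lam in c(0, 0.33, 0.5)) {
  stopifnot(all.equal(invBoxCox(bcp(mass, lam), lam), mass))
}
```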
## Linear mixed model fit by REML ['lmerMod']
## Formula: 
## bcPower(mass_mg, lambda = st_lamb1) ~ log_length_mm + (log_length_mm |  
##     LIT_yr)
##    Data: st_d2m
## 
## REML criterion at convergence: 224.1
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.3799 -0.6041  0.0620  0.7055  1.6744 
## 
## Random effects:
##  Groups   Name          Variance Std.Dev. Corr
##  LIT_yr   (Intercept)   0.000145 0.01204      
##           log_length_mm 0.006786 0.08238  0.90
##  Residual               0.277267 0.52656      
## Number of obs: 136, groups:  LIT_yr, 6
## 
## Fixed effects:
##               Estimate Std. Error t value
## (Intercept)    -1.6357     0.2867  -5.705
## log_length_mm   1.9064     0.1205  15.817
## 
## Correlation of Fixed Effects:
##             (Intr)
## lg_lngth_mm -0.943
Random effects of group on the slope and intercept of the relationship between log mass and log length of Swamp Timothy seed heads.
Group Intercept Slope
CLS_2019 -0.001 -0.008
KRN_2019 -0.009 -0.070
MDC_2021 0.006 0.044
PIX_2019 -0.008 -0.059
SAC_2019 0.016 0.123
SLW_2019 -0.004 -0.030

The linear log-log model does not show need for additional terms. Fitted and observed values are linearly related and centered about the 1:1 line. The magnitude of the random effects shown in the table gives an idea of the potential bias introduced by making marginal predictions. The size of this component of the total error (bias plus variance) can also be assessed in relation to the residual variance in the model summary above. These effects are visualized more easily in the section on evaluation of prediction alternatives presented below.
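The difference between marginal and conditional predictions can be illustrated with the numbers printed above. This is a sketch of the arithmetic on the transformed scale only, using the fixed effects and the SAC_2019 random effects copied from the summary and table, not a call into the fitted lmer object; the 20 mm seed head is a hypothetical example.

```r
# Marginal vs. conditional prediction on the Box-Cox-transformed scale,
# using coefficients copied from the model summary and random-effects table
# above (a sketch of the arithmetic, not a call to predict()).
log_len <- log(20)   # a hypothetical 20 mm seed head

marg <- -1.6357 + 1.9064 * log_len                        # population level
cond <- (-1.6357 + 0.016) + (1.9064 + 0.123) * log_len    # SAC_2019 specific

# SAC_2019 had heavier seed heads for a given length, so its conditional
# prediction sits above the marginal one.
stopifnot(cond > marg)
```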

A numerical assessment of the precision of estimates is performed by calculating bootstrap confidence intervals for predictions of mass of seed heads with average dimensions for each group. Coefficients of variation and margins of error are calculated for each group and compared to desirable values.

Estimated seed head mass (mg), coefficient of variation (cv2), margin of error (MOE) and 90% CI for swamp timothy seedheads of average dimensions.
LIT_yr mass_median lwr mass_mean upr se cv2 MOE n
CLS_2019 32.8 28.4 32.8 37.7 2.8 0.08 0.14 29
KRN_2019 18.5 15.5 18.6 21.6 1.8 0.10 0.16 21
MDC_2021 43.1 36.2 43.3 51.6 4.8 0.11 0.18 18
PIX_2019 30.3 24.8 30.2 35.4 3.3 0.11 0.18 18
SAC_2019 22.4 19.2 22.6 26.6 2.2 0.10 0.16 31
SLW_2019 13.2 11.1 13.2 15.3 1.3 0.10 0.16 19
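The MOE column can be reproduced from the se and mass_mean columns. This sketch assumes MOE is defined as the half-width of a 90% normal-approximation confidence interval expressed as a proportion of the mean, which matches the tabulated values.

```r
# Reading the MOE column (a sketch, assuming MOE = half-width of a 90%
# normal-approximation CI as a proportion of the mean).
# CLS_2019 row above: se = 2.8, mass_mean = 32.8.
moe <- qnorm(0.95) * 2.8 / 32.8
round(moe, 2)   # 0.14, matching the table
```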

The coefficients of variation are very small, indicating high precision in the estimates. Second-order coefficients of variation smaller than 0.20 (20%) are considered very small (Kvålseth 2017). The margin of error achieved at 90% confidence varies between 0.14 and 0.176 of the average, which may be a bit too high according to some standards (FIND REFERENCES) but I think it is quite good for biological and field sciences.

Models for Watergrass

Data Exploration

There are 1142 observations in the Watergrass d2m data set.

Scatter plots show differences among groups in the responses to predictors.

Random-effect approach Watergrass

In the case of Watergrass there are more steps in the model creation because we have more predictors. A first model with as many random effects as the data can support is slightly reduced to reach a final model with two predictors.

wg_frml1 <- mass_mg ~
  log_length_mm +
  log_emerged +
  log_length_mm:log_emerged +
  (1 | LIT_yr) +
  (0 + log_length_mm | LIT_yr) +
  (0 + log_emerged | LIT_yr) + 
  (0 + log_length_mm:log_emerged | LIT_yr)

wg_lamb = powerTransform(
  lmer(wg_frml1,
       data = wg_d2m)) %>%
  coef(round = TRUE) %>%
  unname()

# wg_d2m %<>%
#   mutate(bc_mass = bcPower(mass_mg, lambda = wg_lamb))
# 
wg_d2m_m2 <- lmer(update(wg_frml1, bcPower(mass_mg, lambda = wg_lamb) ~ .),
                  data = wg_d2m)

summary(wg_d2m_m2)
## Linear mixed model fit by REML ['lmerMod']
## Formula: bcPower(mass_mg, lambda = wg_lamb) ~ log_length_mm + log_emerged +  
##     (1 | LIT_yr) + (0 + log_length_mm | LIT_yr) + (0 + log_emerged |  
##     LIT_yr) + (0 + log_length_mm:log_emerged | LIT_yr) + log_length_mm:log_emerged
##    Data: wg_d2m
## 
## REML criterion at convergence: 3227.9
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.4672 -0.5304  0.0547  0.6374  5.5641 
## 
## Random effects:
##  Groups   Name                      Variance Std.Dev.
##  LIT_yr   (Intercept)               0.118208 0.34381 
##  LIT_yr.1 log_length_mm             0.005771 0.07597 
##  LIT_yr.2 log_emerged               4.113371 2.02814 
##  LIT_yr.3 log_length_mm:log_emerged 0.403203 0.63498 
##  Residual                           0.956015 0.97776 
## Number of obs: 1142, groups:  LIT_yr, 4
## 
## Fixed effects:
##                            Estimate Std. Error t value
## (Intercept)               -9.269398   0.544570 -17.022
## log_length_mm              3.454726   0.117542  29.391
## log_emerged                2.166046   1.486860   1.457
## log_length_mm:log_emerged  0.008455   0.390819   0.022
## 
## Correlation of Fixed Effects:
##             (Intr) lg_ln_ lg_mrg
## lg_lngth_mm -0.886              
## log_emerged -0.293  0.282       
## lg_lngth_:_  0.237 -0.242 -0.419
wg_d2m_m3 <- update(wg_d2m_m2, . ~ . - (0 + log_length_mm:log_emerged | LIT_yr))

# wg_d2m_m2 is better than wg_d2m_m3. Try to remove another random effect.
anova(wg_d2m_m2, wg_d2m_m3)
## Data: wg_d2m
## Models:
## wg_d2m_m3: bcPower(mass_mg, lambda = wg_lamb) ~ log_length_mm + log_emerged + (1 | LIT_yr) + (0 + log_length_mm | LIT_yr) + (0 + log_emerged | LIT_yr) + log_length_mm:log_emerged
## wg_d2m_m2: bcPower(mass_mg, lambda = wg_lamb) ~ log_length_mm + log_emerged + (1 | LIT_yr) + (0 + log_length_mm | LIT_yr) + (0 + log_emerged | LIT_yr) + (0 + log_length_mm:log_emerged | LIT_yr) + log_length_mm:log_emerged
##           npar    AIC    BIC logLik deviance  Chisq Df Pr(>Chisq)  
## wg_d2m_m3    8 3245.9 3286.2  -1615   3229.9                       
## wg_d2m_m2    9 3243.9 3289.3  -1613   3225.9 3.9725  1    0.04625 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
wg_d2m_m4 <- update(wg_d2m_m2, . ~ . -(0 + log_length_mm | LIT_yr))

anova(wg_d2m_m2, wg_d2m_m4) # wg_d2m_m4 is better
## Data: wg_d2m
## Models:
## wg_d2m_m4: bcPower(mass_mg, lambda = wg_lamb) ~ log_length_mm + log_emerged + (1 | LIT_yr) + (0 + log_emerged | LIT_yr) + (0 + log_length_mm:log_emerged | LIT_yr) + log_length_mm:log_emerged
## wg_d2m_m2: bcPower(mass_mg, lambda = wg_lamb) ~ log_length_mm + log_emerged + (1 | LIT_yr) + (0 + log_length_mm | LIT_yr) + (0 + log_emerged | LIT_yr) + (0 + log_length_mm:log_emerged | LIT_yr) + log_length_mm:log_emerged
##           npar    AIC    BIC logLik deviance  Chisq Df Pr(>Chisq)
## wg_d2m_m4    8 3242.1 3282.4  -1613   3226.1                     
## wg_d2m_m2    9 3243.9 3289.3  -1613   3225.9 0.1273  1     0.7212
summary(wg_d2m_m4)
## Linear mixed model fit by REML ['lmerMod']
## Formula: bcPower(mass_mg, lambda = wg_lamb) ~ log_length_mm + log_emerged +  
##     (1 | LIT_yr) + (0 + log_emerged | LIT_yr) + (0 + log_length_mm:log_emerged |  
##     LIT_yr) + log_length_mm:log_emerged
##    Data: wg_d2m
## 
## REML criterion at convergence: 3228.3
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.4745 -0.5335  0.0536  0.6317  5.5586 
## 
## Random effects:
##  Groups   Name                      Variance Std.Dev.
##  LIT_yr   (Intercept)               0.1524   0.3904  
##  LIT_yr.1 log_emerged               4.1470   2.0364  
##  LIT_yr.2 log_length_mm:log_emerged 0.4435   0.6659  
##  Residual                           0.9573   0.9784  
## Number of obs: 1142, groups:  LIT_yr, 4
## 
## Fixed effects:
##                           Estimate Std. Error t value
## (Intercept)                -9.1549     0.5453 -16.790
## log_length_mm               3.4237     0.1086  31.520
## log_emerged                 2.1562     1.4962   1.441
## log_length_mm:log_emerged   0.0252     0.4045   0.062
## 
## Correlation of Fixed Effects:
##             (Intr) lg_ln_ lg_mrg
## lg_lngth_mm -0.922              
## log_emerged -0.294  0.308       
## lg_lngth_:_  0.227 -0.251 -0.410
# Re fit wg_d2m_m4 with a new lambda

wg_lamb = powerTransform(
  lmer(update(formula(wg_d2m_m4), mass_mg ~ .),
       data = wg_d2m)) %>%
  coef(round = TRUE) %>%
  unname()

wg_d2m_m4 <- lmer(formula(wg_d2m_m4), data = wg_d2m)

summary(wg_d2m_m4)
## Linear mixed model fit by REML ['lmerMod']
## Formula: bcPower(mass_mg, lambda = wg_lamb) ~ log_length_mm + log_emerged +  
##     (1 | LIT_yr) + (0 + log_emerged | LIT_yr) + (0 + log_length_mm:log_emerged |  
##     LIT_yr) + log_length_mm:log_emerged
##    Data: wg_d2m
## 
## REML criterion at convergence: 3224.4
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.4750 -0.5332  0.0538  0.6319  5.5586 
## 
## Random effects:
##  Groups   Name                      Variance Std.Dev.
##  LIT_yr   (Intercept)               0.1521   0.3900  
##  LIT_yr.1 log_emerged               4.1348   2.0334  
##  LIT_yr.2 log_length_mm:log_emerged 0.4408   0.6640  
##  Residual                           0.9540   0.9767  
## Number of obs: 1142, groups:  LIT_yr, 4
## 
## Fixed effects:
##                           Estimate Std. Error t value
## (Intercept)               -9.13573    0.54438 -16.782
## log_length_mm              3.41817    0.10844  31.522
## log_emerged                2.15561    1.49378   1.443
## log_length_mm:log_emerged  0.02424    0.40346   0.060
## 
## Correlation of Fixed Effects:
##             (Intr) lg_ln_ lg_mrg
## lg_lngth_mm -0.921              
## log_emerged -0.294  0.308       
## lg_lngth_:_  0.227 -0.251 -0.410
wg_d2m_m5 <- update(wg_d2m_m4, . ~ . -(0 + log_length_mm:log_emerged | LIT_yr))

anova(wg_d2m_m5, wg_d2m_m2)
## Data: wg_d2m
## Models:
## wg_d2m_m5: bcPower(mass_mg, lambda = wg_lamb) ~ log_length_mm + log_emerged + (1 | LIT_yr) + (0 + log_emerged | LIT_yr) + log_length_mm:log_emerged
## wg_d2m_m2: bcPower(mass_mg, lambda = wg_lamb) ~ log_length_mm + log_emerged + (1 | LIT_yr) + (0 + log_length_mm | LIT_yr) + (0 + log_emerged | LIT_yr) + (0 + log_length_mm:log_emerged | LIT_yr) + log_length_mm:log_emerged
##           npar    AIC    BIC  logLik deviance  Chisq Df Pr(>Chisq)
## wg_d2m_m5    7 3241.3 3276.6 -1613.7   3227.3                     
## wg_d2m_m2    9 3243.9 3289.3 -1613.0   3225.9 1.3925  2     0.4984
summary(wg_d2m_m5)
## Linear mixed model fit by REML ['lmerMod']
## Formula: bcPower(mass_mg, lambda = wg_lamb) ~ log_length_mm + log_emerged +  
##     (1 | LIT_yr) + (0 + log_emerged | LIT_yr) + log_length_mm:log_emerged
##    Data: wg_d2m
## 
## REML criterion at convergence: 3231.5
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.4849 -0.5291  0.0590  0.6305  5.4923 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  LIT_yr   (Intercept) 0.1794   0.4235  
##  LIT_yr.1 log_emerged 6.3943   2.5287  
##  Residual             0.9632   0.9814  
## Number of obs: 1142, groups:  LIT_yr, 4
## 
## Fixed effects:
##                           Estimate Std. Error t value
## (Intercept)                -9.2352     0.5527 -16.708
## log_length_mm               3.4425     0.1088  31.635
## log_emerged                 3.1706     1.4594   2.173
## log_length_mm:log_emerged  -0.1998     0.1520  -1.314
## 
## Correlation of Fixed Effects:
##             (Intr) lg_ln_ lg_mrg
## lg_lngth_mm -0.911              
## log_emerged -0.301  0.313       
## lg_lngth_:_  0.601 -0.661 -0.482
print("Pseudo R-squared for model wg_d2m_m4")
## [1] "Pseudo R-squared for model wg_d2m_m4"
MuMIn::r.squaredGLMM(wg_d2m_m4)
##            R2m       R2c
## [1,] 0.3218151 0.8893871
print("Pseudo R-squared for model wg_d2m_m5")
## [1] "Pseudo R-squared for model wg_d2m_m5"
MuMIn::r.squaredGLMM(wg_d2m_m5)
##            R2m       R2c
## [1,] 0.4377277 0.8449463

Two models (wg_d2m_m4 and wg_d2m_m5) are good candidates for predictions. First we study the general precision of the model that is slightly better, wg_d2m_m4. The histograms of bootstrapped predictions clearly show the effect of sample size on the precision of estimates.

Estimated seed head mass (mg), coefficient of variation (cv2), margin of error (MOE) and 90% CI for watergrass seedheads of average dimensions. Based on bootMer
LIT_yr mass_median lwr mass_mean upr se cv2 MOE n
CLS_2019 135 127 135 143 5.0 0.04 0.06 396
MDC_2021 162 131 168 225 29.4 0.17 0.26 42
SAC_2019 101 97 101 105 2.5 0.02 0.04 644
SLW_2019 239 210 241 276 19.8 0.08 0.13 60

Precision is very high for SAC_2019 and CLS_2019, high for SLW_2019 and marginal for MDC_2021. Relative to the means, margins of error (half the CI width as a proportion of the mean) are between 0.04 and 0.264. Given that the results for CLS_2019 are close to achievable and desirable standards, we recommend a sample size of 300-400 seed heads to develop new d2m models for Watergrass.

Overall, model wg_d2m_m4 has a conditional pseudo R-sq (Nakagawa and Schielzeth 2013, Bartoń 2020) of 0.8893871 and a marginal pseudo R-sq of 0.3218151. The corresponding values for wg_d2m_m5, shown above, indicate that although it has a slightly smaller conditional R-sq, its marginal R-sq is better. This means that it relies more on the fixed effects and might be a more conservative model with better performance for unobserved groups.

It is likely that much of the variation in seed head mass for a given set of dimensions is due to differences in the phenological stage of seed heads. Seed heads that are immature will tend to be lighter than expected because they have not developed or completely filled the grains, whereas mature seed heads will tend to be heavier. Because the d2m models excluded seed heads with shattered seeds, the mass of seed produced by seed heads in late phenological stages is not underestimated. Even when seeds have dropped, the linear dimensions of the seed head stem and rachis remain indicative of the mass of the complete seed head. Therefore, acceptable estimates of the mass of seed heads that have lost many seeds can be made from their linear dimensions, because the models are based on nearly intact seed heads.

Models for Smartweed

Data Exploration

There are 448 observations in the Smartweed d2m data set.

Some models (Sherfy et al., 1999; Osborn et al., 2017; Collins et al., 2017) use an a priori combination of dimensions, such as an approximate volume, and then regress mass on that volume. We develop a model based on volume because the data include length and width of seed heads. The volume is approximated with the formula for a cylinder. Because the width of Smartweed seed heads was not measured in all refuges, we also develop a model that uses seed head length as a single predictor. Scatter plots of mass against all predictors are used to explore the multivariate relationships. The top row of graphs shows one outlier with extremely high mass for a very short seed head (mass vs. length) and one outlier with extreme volume and low mass (mass vs. volume). It is also evident that variance increases with increasing dimensions, and that length has high predictive value.

##     LIT year       spp length_mm width_mm mass_mg   LIT_yr log_mass_mg
## 212 MDC 2021 Smartweed        29       18  116.16 MDC_2021    4.754969
##     log_length_mm  vol_mm3
## 212      3.367296 7379.601

## [1] 381

An outlier with unusual width and one with extreme volume were removed.
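The cylinder approximation can be verified against the printed outlier row (length 29 mm, width 18 mm, vol_mm3 7379.601): treating width as the cylinder diameter and length as its height reproduces the stored volume. The function name below is ours, for illustration.

```r
# Cylinder approximation of seed head volume: width is taken as the
# diameter and length as the height (function name is illustrative).
cyl_vol <- function(length_mm, width_mm) pi * (width_mm / 2)^2 * length_mm

# Reproduces the vol_mm3 value in the printed outlier row:
stopifnot(abs(cyl_vol(29, 18) - 7379.601) < 0.01)
```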

Model

Data for Smartweed come only from Modoc in 2021. Therefore, there are no groups and the model includes only fixed effects. The model development process consists of finding the appropriate functional form to use.

Volume as single predictor

First we try a simple linear model with volume.

## 
## Call:
## lm(formula = mass_mg ~ vol_mm3, data = sw_d2m)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -183.20  -20.65  -10.64   15.95  215.37 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 14.636579   3.122063   4.688 3.67e-06 ***
## vol_mm3      0.105712   0.004403  24.007  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 42.29 on 444 degrees of freedom
## Multiple R-squared:  0.5649, Adjusted R-squared:  0.5639 
## F-statistic: 576.4 on 1 and 444 DF,  p-value: < 2.2e-16

Variance increases with fitted value and there is indication of slight non-linearity. We try a Box-Cox transformation on the response and log transformation of the predictor.

# Get optimal lambda
sw_lamb1 <- powerTransform(lm(mass_mg ~ vol_mm3,
                              data = sw_d2m)) %>%
  coef(round = TRUE) %>%
  unname()

# Fit model with transformed Y
sw_m2 <- lm(bcPower(mass_mg, lambda = sw_lamb1) ~ vol_mm3,
               data = sw_d2m)

opar <- par(mfrow = c(2,2)); plot(sw_m2); par(opar)

# Some nonlinearity remains, which increases the variance at the right side
# Try an additional transformation in volume

sw_lamb2 <- powerTransform(lm(mass_mg ~ log(vol_mm3),
                              data = sw_d2m)) %>%
  coef(round = TRUE) %>%
  unname()

sw_m3 <- lm(bcPower(mass_mg, lambda = sw_lamb2) ~ log(vol_mm3),
               data = sw_d2m)

opar <- par(mfrow = c(2,2)); plot(sw_m3); par(opar)

summary(sw_m3)
## 
## Call:
## lm(formula = bcPower(mass_mg, lambda = sw_lamb2) ~ log(vol_mm3), 
##     data = sw_d2m)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.50635 -0.58610 -0.06237  0.58599  2.71739 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -4.72402    0.27447  -17.21   <2e-16 ***
## log(vol_mm3)  1.65073    0.04566   36.16   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8832 on 444 degrees of freedom
## Multiple R-squared:  0.7465, Adjusted R-squared:  0.7459 
## F-statistic:  1307 on 1 and 444 DF,  p-value: < 2.2e-16

The residuals of the second model, sw_m2, exhibit a clear concave-down shape that was addressed by log-transforming the predictor (vol) in sw_m3. The last model meets assumptions and is selected. This model has an adjusted coefficient of determination of 0.746 and residual standard deviation equal to 0.8832.

Precision of estimates is explored with confidence intervals and histograms for predictions made at the average value of predictors. In this case we use tidymodels to do nonparametric bootstrapping because there are no random effects, so bootMer is not applicable.

Estimated seed head mass (mg), coefficient of variation (cv2), margin of error (MOE) and 90% CI for Smartweed seedheads of average dimensions. Model uses log(vol_mm3) as single predictor.
LIT_yr mass_median lwr mass_mean upr se cv2 MOE n
MDC_2021 69 66 69 72 1.7 0.02 0.04 446

Multiple predictors

Instead of using only a preset function of dimensions (volume) as predictor, this section explores the possibility of using all dimensions as predictors.

## 
## Call:
## lm(formula = bcPower(mass_mg, lambda = sw_lamb3) ~ log(vol_mm3) + 
##     width_mm + length_mm, data = sw_d2m)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.54318 -0.68252 -0.08755  0.63952  2.85970 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -5.58706    0.61661  -9.061  < 2e-16 ***
## log(vol_mm3)  2.10084    0.21581   9.735  < 2e-16 ***
## width_mm     -0.26845    0.08686  -3.091  0.00212 ** 
## length_mm     0.01660    0.01403   1.183  0.23741    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9537 on 442 degrees of freedom
## Multiple R-squared:  0.7646, Adjusted R-squared:  0.763 
## F-statistic: 478.5 on 3 and 442 DF,  p-value: < 2.2e-16
## Analysis of Variance Table
## 
## Response: bcPower(mass_mg, lambda = sw_lamb3)
##               Df  Sum Sq Mean Sq   F value    Pr(>F)    
## log(vol_mm3)   1 1268.77 1268.77 1394.8608 < 2.2e-16 ***
## width_mm       1   35.77   35.77   39.3231 8.557e-10 ***
## length_mm      1    1.27    1.27    1.3997    0.2374    
## Residuals    442  402.04    0.91                        
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Model sw_m5 has R-sq equal to 0.763 and residual standard deviation equal to 0.9537. These values cannot be directly compared to the corresponding ones for other models because the transformations -and therefore the units- are different. Models have to be compared on the basis of back-transformed bootstrapping results.

Estimated seed head mass (mg), coefficient of variation (cv2), margin of error (MOE) and 90% CI for Smartweed seedheads of average dimensions. Model uses log(vol_mm3), length_mm and width_mm as predictors.
LIT_yr mass_median lwr mass_mean upr se cv2 MOE n
MDC_2021 73 68 73 78 3.1 0.04 0.07 446

Finally, we proceed to use length as a single predictor.

Length as a single predictor

Width was not measured in any refuge other than MDC in 2021. In order to calculate mass of Smartweed for the other refuges we need to make a model with length as a single predictor. Keep in mind that the only training data available for d2m is from MDC.

## 
## Call:
## lm(formula = bcPower(mass_mg, lambda = sw_lamb4) ~ length_mm, 
##     data = sw_d2m)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.6908 -2.3454 -0.3695  2.0940 10.8472 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.0661     0.3041   10.08   <2e-16 ***
## length_mm     0.4848     0.0177   27.39   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.097 on 444 degrees of freedom
## Multiple R-squared:  0.6282, Adjusted R-squared:  0.6274 
## F-statistic: 750.3 on 1 and 444 DF,  p-value: < 2.2e-16
## Analysis of Variance Table
## 
## Response: bcPower(mass_mg, lambda = sw_lamb4)
##            Df Sum Sq Mean Sq F value    Pr(>F)    
## length_mm   1 7198.1  7198.1  750.34 < 2.2e-16 ***
## Residuals 444 4259.3     9.6                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Model sw_m6 is evaluated using tidy bootstrapping. This model can be used for groups where only length of Smartweed seed heads is measured. The advantage of measuring both dimensions can be assessed by comparing sw_m6 with the previous two models.

Estimated seed head mass (mg), coefficient of variation (cv2), margin of error (MOE) and 90% CI for Smartweed seedheads of average dimensions. Model uses length_mm as single predictor.
LIT_yr mass_median lwr mass_mean upr se cv2 MOE n
MDC_2021 65 63 65 68 1.6 0.03 0.04 446

This last model predicts seed head mass at least as well as the model that uses volume, and better than the model with multiple predictors. This strongly suggests that it is sufficient to measure only the length of Smartweed seed heads. Precision is very high with 446 observations. We suggest the same rule as for Watergrass: 300-400 seed heads should be analyzed for d2m model development.

Model Evaluation

This section applies to Swamp Timothy and Watergrass because we have measurements for these species in more than one refuge-year combination. The evaluation focuses on determining the potential increments in precision and accuracy obtained by developing a dimensions-to-mass (d2m) model for each species in each refuge and year when the protocol is applied. Simultaneously, precision of the models for the present groups is evaluated by cross-validation, which should give more conservative estimates than those obtained above using bootstrapping.

Approach and functions

With these data we have the following alternatives:

  1. Collect d2m data for each new LIT_yr group. Augment the database and create a model that always includes effects of all LIT_yr groups ever observed. Always make group-specific predictions (conditional predictions) whose uncertainty excludes the variance of random effects.

The performance of this option is assessed with the present data by k-fold cross-validation with folds balanced by groups. The k-fold validation is based on assessing predictions for multiple data subsets that are not included in the model parameterization, but all refuge-year groups to be predicted are represented in the data.

  2. Take all d2m data available and create a model that includes effects of LIT_yr groups. No more data about d2m are collected and this model is not updated. The LIT_yr groups present in the data receive conditional predictions as explained in 1 above. Predictions for other years and locations are made at the population level (marginal prediction) but the variance of random effects contributes to the uncertainty of estimates. Predictions for unmeasured groups will be biased by an amount proportional to how much they differ from the average of all groups included in the model. Examples are seen clearly in the case of Watergrass presented below.

The performance of this option is assessed with the present data by leaving out one group at a time, developing the model with the remaining groups, and using it to make predictions for the group left out.
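Both validation schemes can be sketched in base R. This is an assumption about implementation details (the report's own code is in the supplementary materials): lm stands in for the mixed models, and the data are simulated for illustration.

```r
# Sketch of the two validation schemes (simulated data; lm stands in for
# the lmer models used in the report).
set.seed(1)
df <- data.frame(grp = rep(c("CLS_2019", "KRN_2019", "SAC_2019"),
                           times = c(29, 21, 31)))
df$x <- rnorm(nrow(df))
df$y <- 1 + 2 * df$x + rnorm(nrow(df))

# Option 1: k-fold CV with folds balanced by group -- spread each group's
# observations evenly across the k folds, so every fold contains all groups.
k <- 5
fold <- integer(nrow(df))
for (idx in split(seq_len(nrow(df)), df$grp)) {
  fold[idx] <- sample(rep_len(1:k, length(idx)))
}
stopifnot(all(table(df$grp, fold) > 0))

# Option 2: leave-one-group-out -- fit without a group, predict that group.
logo_pred <- unlist(lapply(unique(df$grp), function(g) {
  fit <- lm(y ~ x, data = df[df$grp != g, ])
  predict(fit, newdata = df[df$grp == g, ])
}))
stopifnot(length(logo_pred) == nrow(df))
```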

Three metrics are used to assess the performance of the options:

  1. Mean absolute deviation (mae) is a measure of accuracy and error.
  2. Root mean squared error (rmse) is a measure of accuracy and total error.
  3. R squared (rsq) is the square of the correlation between observed and predicted values and it is a measure of consistency.

Predictions that deviate from observed values in a linear relationship can have very large rsq (good) and also large rmse (bad). If the slope of the relationship is 1.0 the bias is constant over X (predictor space). If the slope is different from 1.0, the bias changes as X changes.
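Minimal implementations of the three metrics make the biased-but-linear case concrete. This is a sketch; the report's tables may come from a package such as yardstick, which uses the same formulas.

```r
# Minimal implementations of the three evaluation metrics (a sketch).
mae  <- function(obs, pred) mean(abs(obs - pred))
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))
rsq  <- function(obs, pred) cor(obs, pred)^2

# A constant bias inflates mae and rmse but leaves rsq at 1, which is why
# rsq alone cannot detect biased-but-linear predictions.
obs  <- c(10, 20, 30, 40)
pred <- obs + 5                      # constant bias of 5
stopifnot(mae(obs, pred) == 5,
          rmse(obs, pred) == 5,
          abs(rsq(obs, pred) - 1) < 1e-12)
```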

Function to evaluate alternatives

This function performs cross-validation and external validation. The code for this function (and for a complete .RMD file) is included in supplementary materials.

Function to summarize results of each evaluation

This function takes the list created by the previous function, summarizes and plots results.

Evaluation of prediction alternatives

Swamp Timothy graphical results

[Figure: Swamp Timothy predictions for sampled refuges]

[Figure: Swamp Timothy predictions for new refuges]

The top graph for Swamp Timothy shows the observed vs. predicted values from a model that accounts for the differences among groups. These are the conditional predictions, which do not include uncertainty due to between-group variance, because the random effects for all predicted groups are available. Each cross-validation was repeated 10 times; therefore, for each observed value (mass_mg) there are 10 predictions (model_pred). Predictions are precise and accurate except for MDC_2021, which exhibits large variance, particularly at the large end of the range. The second longest seed head in this group is an outlier with an extremely low mass of 20.05 mg. CLS_2019 and MDC_2021 show negative bias, which in the case of CLS increases with mass. Overall, performance is marginal and comparable to that reported above for Swamp Timothy based on bootstrapping. Given that sample sizes are fairly small, model performance should improve with larger samples.

The graphs with predictions for “new” refuges or groups show that performance is worse for MDC_2021 and CLS_2019 due to increased bias and reduced precision. SAC_2019 exhibits slight bias, which would be acceptable.

Seed head sampling for Swamp Timothy

We recommend that future models be based on samples of 50-100 observations (seed heads) per group. A good strategy is to collect samples that span a wide range of seed head lengths: divide that range into 6-10 classes and make sure each class contains 5-10 seed heads collected from a variety of locations in the refuge.
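The class-based sampling scheme could be implemented along these lines; a Python sketch with a hypothetical `sample_by_length_class` helper and simulated lengths, not WHAP code:

```python
import random

def sample_by_length_class(lengths, n_classes=8, per_class=6, seed=1):
    """Pick seed head indices so that each length class contributes
    roughly `per_class` observations (hypothetical helper)."""
    rng = random.Random(seed)
    lo, hi = min(lengths), max(lengths)
    width = (hi - lo) / n_classes
    classes = {}
    for i, x in enumerate(lengths):
        c = min(int((x - lo) / width), n_classes - 1)  # clamp the maximum
        classes.setdefault(c, []).append(i)
    chosen = []
    for idx in classes.values():
        chosen += rng.sample(idx, min(per_class, len(idx)))
    return chosen

# Simulated candidate seed head lengths (mm) spanning a wide range
r = random.Random(0)
lengths = [r.uniform(5, 80) for _ in range(300)]
chosen = sample_by_length_class(lengths)
```

The point of sampling by length class rather than at random is that the fitted d2m relationship is then supported across the whole predictor space, instead of being dominated by the most common lengths.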

Swamp Timothy metrics

Table of cross-validation results for Swamp Timothy. These results represent how well the model performs when all groups have d2m data.
LIT_yr rmse rsq mae
CLS_2019 35.0 0.82 17.4
KRN_2019 14.0 0.42 9.6
MDC_2021 122.5 0.36 61.7
PIX_2019 22.9 0.62 15.4
SAC_2019 12.2 0.85 8.9
SLW_2019 3.7 0.57 2.7
Average 35.0 0.60 19.3
Table of final validation results for Swamp Timothy when the data for each refuge are kept out of model creation and used for final testing. This test provides a realistic estimate of the error expected when using data from an observed set of refuge-year combinations to predict seed head mass in a new refuge-year.
grp_out rmse rsq mae
CLS_2019 37.0 0.85 18.1
KRN_2019 13.2 0.51 9.9
MDC_2021 112.3 0.65 57.0
PIX_2019 19.4 0.74 15.7
SAC_2019 18.1 0.89 13.5
SLW_2019 3.8 0.60 2.9
Average 34.0 0.71 19.5

Cross-validation metrics further support the need for larger sample sizes. In these tables, the rmse is the square root of the mean squared error, which in turn is the sum of squared bias plus variance. On the other hand, rsq indicates the proportion of variance explained, excluding bias. Groups with relatively high rsq but high rmse therefore have greater bias. In agreement with the graphical output, MDC_2021 and CLS_2019 show very low to low precision and high bias.
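The decomposition behind this reading of the tables, mse = bias² + variance of the errors, can be verified numerically; a small Python sketch with hypothetical errors (constant bias of 8 plus mean-zero noise):

```python
import numpy as np

obs = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
pred = obs + 8.0 + np.array([-2.0, 1.0, 0.0, 2.0, -1.0])  # constant bias + noise

errors = pred - obs
mse = np.mean(errors ** 2)
bias = np.mean(errors)
variance = np.var(errors)  # population variance of the errors
assert np.isclose(mse, bias ** 2 + variance)
rmse = np.sqrt(mse)
```

With these numbers the squared bias (64) dominates the error variance (2), which is exactly the situation where rsq can stay high while rmse balloons.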

Watergrass graphical results

[Figure: Watergrass predictions for sampled refuges]

[Figure: Watergrass predictions for new refuges]


Watergrass models have excellent precision in cross-validation due to the much larger sample sizes than for Swamp Timothy. MDC_2021 exhibits some negative bias and surprisingly heavy seed heads, more than 5 times heavier than the heaviest seed heads collected in other groups. This data set has to be revisited and checked.

External validation shows dramatic biases introduced when making predictions for groups not included in the training sample. Most of the bias is introduced by MDC_2021. Those data have to be checked, and a larger set of groups has to be sampled to determine the need for group-specific d2m sampling in the future. Without this information, the present conclusion is to obtain group-specific samples in the future.

Watergrass metrics

Table of cross-validation results for Watergrass. These results represent how well the model performs when all groups have d2m data.
LIT_yr rmse rsq mae
CLS_2019 100.3 0.58 68.0
MDC_2021 718.5 0.58 383.4
SAC_2019 80.4 0.67 48.7
SLW_2019 147.2 0.45 114.0
Average 261.6 0.57 153.5
Table of final validation results for Watergrass when the data for each refuge are kept out of model creation and used for final testing. This test provides a realistic estimate of the error expected when using data from an observed set of refuge-year combinations to predict seed head mass in a new refuge-year.
grp_out rmse rsq mae
CLS_2019 246.2 0.38 141.7
MDC_2021 1245.6 0.70 767.1
SAC_2019 268.0 0.34 174.6
SLW_2019 168.2 0.48 127.8
Average 482.0 0.47 302.8



Discussion of evaluation results

When all groups are in the training data, predictions are accurate, but because some groups deviate from the rest, excluding groups from the training data introduces bias in the predictions for the “new” groups. The potential magnitude of the bias is almost completely dependent on data from MDC_2021 for both Swamp Timothy and Watergrass.

The need for group-specific d2m data for Smartweed cannot be assessed because we only have data for one group, MDC_2021. The sample size for this group is large enough to provide accurate and precise estimates.

Models to estimate mass based on linear dimensions of seed heads have been developed by several research groups across the US; see references below.

A quick read of several papers reporting equations to estimate seed mass based on plant dimensions revealed that none of their equations included transformations (except for a couple in Gray et al. (2009)). I find it hard to believe that their variances about predicted values were the same regardless of the size of seed heads and plants. It is almost a physical necessity that larger plants exhibit more variance (in absolute terms) than smaller plants. Because of the restriction to be positive, it is almost impossible for small plants or seed heads to have the same variance as larger ones. Strictly speaking, mass cannot have a normal distribution, and it is very likely to deviate more from normality as the mean becomes small (hence the lognormal distribution, or the log transformation of mass).
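The claim that absolute variance must grow with size is a standard property of the lognormal distribution: with a constant sd on the log scale, the original-scale sd is proportional to the original-scale mean. A small numerical sketch (illustrative parameters only, not fitted values):

```python
import math

# For a lognormal variable with log-scale mean mu and log-scale sd sigma:
#   mean = exp(mu + sigma^2 / 2),  sd = mean * sqrt(exp(sigma^2) - 1)
# so the original-scale sd is proportional to the original-scale mean.
def lognormal_mean_sd(mu, sigma):
    mean = math.exp(mu + sigma ** 2 / 2)
    sd = mean * math.sqrt(math.exp(sigma ** 2) - 1)
    return mean, sd

# Same sigma on the log scale, increasing mu: the absolute sd grows with
# the mean, while the coefficient of variation (sd / mean) stays constant.
results = [lognormal_mean_sd(mu, 0.4) for mu in (1.0, 2.0, 3.0)]
```

This is why modeling log(mass), or using a lognormal error, naturally accommodates larger seed heads being more variable in absolute terms.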

Most articles seem to emphasize the coefficient of determination as a measure of model performance. We should keep in mind that prediction performance should be judged by the accuracy and precision of predictions, which can be evaluated by cross-validation and external validation.

Estimates of standing seed head mass at a given date most likely underestimate the seed production during the whole season (Hillhouse et al., 2018). Several species have an extended seed head production season, and the standing crop at any given time is a fraction of the total seasonal production. Thus, the estimates produced by the present method should be used in relative terms, and sampling dates should be standardized based on the phenology of the plant community. The sampling date could be set on the basis of seasonal thermal units (degree-days).


How well do the models work for sampled refuges? They work very well in all cases where sample size is sufficient. Swamp Timothy estimates are acceptably precise with sample sizes greater than 30. Watergrass and Smartweed estimates are excellent for sample sizes greater than 400.


How well do the models work for unsampled refuges? They work more or less acceptably for Swamp Timothy, perhaps with the exception of MDC_2021. They work very poorly for Watergrass, mostly because of the odd results for MDC_2021.


How does this compare with the rest of the uncertainty?

Calculate effect of adding this uncertainty to the whole calculation. TBD

Strategy to update models

Based on an ad hoc assessment of the feasibility and cost of obtaining additional d2m data, we recommend that the models developed up to the time of this report be used for three additional years. Conditional predictions of mass should be used for refuges for which there are d2m data. Marginal predictions should be used for new refuges.

Conditional predictions incorporate the random effects of refuge-year groups, under the assumption that random effects are associated with refuge and remain constant over years. Marginal predictions are also known as population-level predictions and they do not incorporate any random effects, because in the absence of estimated random effects, the best guess is the average across all groups. For more detail about the difference between predicting for observed vs. new groups see pages 272-274 of Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. Cambridge: Cambridge University Press.
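The conditional vs. marginal distinction can be illustrated without a mixed-model library. The Python sketch below uses simulated groups and estimates group effects as mean residuals around a common regression line (a crude stand-in for the BLUPs a mixed model would produce); all names and numbers are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical grouped d2m data: each refuge-year group has its own intercept.
true_intercepts = {"CLS_2019": -10.0, "SAC_2019": 0.0, "MDC_2021": 25.0}
length = {g: rng.uniform(10, 60, 40) for g in true_intercepts}
mass = {g: 2.0 * length[g] + b + rng.normal(0, 3, 40)
        for g, b in true_intercepts.items()}

# Common slope estimated from all data; group effects as mean residuals.
x = np.concatenate(list(length.values()))
y = np.concatenate(list(mass.values()))
slope, grand_intercept = np.polyfit(x, y, 1)
group_effect = {g: float(np.mean(mass[g] - (slope * length[g] + grand_intercept)))
                for g in true_intercepts}

def predict(x_new, group=None):
    """Conditional prediction when the group was observed; marginal
    (population-level) prediction otherwise."""
    base = slope * x_new + grand_intercept
    return base + group_effect[group] if group in group_effect else base

cond = predict(30.0, "MDC_2021")   # uses the estimated MDC_2021 effect
marg = predict(30.0, "NEW_2025")   # falls back to the population average
```

For an observed group the prediction is shifted by that group's estimated effect; for a new group the best guess is the population line, and the between-group variance then belongs in the prediction uncertainty.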

After three years or whenever feasible, collect data for all species in all refuges. Analyze the cumulative data with mixed models that include random effects for slopes and intercepts for refuge, year and their interaction. Evaluate the impact of making predictions that do not incorporate the random effects of year or interaction. If bias introduced by ignoring those effects or the rmse is not acceptable, refuges have to be sampled every year.

References

Kvålseth, T. O. (2017). Coefficient of variation: the second-order alternative. Journal of Applied Statistics, 44(3), 402-415. https://doi.org/10.1080/02664763.2016.1174195

Gray, M. J., Foster, M. A., & Peña Peniche, L. A. (2009). New Technology for Estimating Seed Production of Moist-Soil Plants. Journal of Wildlife Management, 73(7). https://doi.org/10.2193/2008-468

Hillhouse, H. L., Zilli, L., & Anderson, B. E. (2018). Timing and Protocols for Estimating Seed Production in Moist-Soil and Phalaris arundinacea Dominated Areas in Rainwater Basin Wetlands. Wetlands, 38(3). https://doi.org/10.1007/s13157-017-0991-4

Laubhan, M. K., & Fredrickson, L. H. (1992). Estimating seed production of common plants in seasonally flooded wetlands. Journal of Wildlife Management, 56(2), 329-337. https://doi.org/10.2307/3808831

Haukos, D. A., & Smith, L. M. (1993). Moist-soil management of playa lakes for migrating and wintering ducks. Wildlife Society Bulletin, 21(3), 288-298.

Sherfy, M. H., & Kirkpatrick, R. L. (1999). Additional regression equations for predicting seed yield of moist-soil plants. Wetlands, 19(3), 709-714. https://doi.org/10.1007/BF03161707

Gray, M. J., Kaminski, R. M., & Brasher, M. G. (1999). A new method to predict seed yield of moist-soil plants. Journal of Wildlife Management, 63(4), 1269-1272. https://doi.org/10.2307/3802844

Gray, M. J., Kaminski, R. M., & Weerakkody, G. (1999). Predicting seed yield of moist-soil plants. Journal of Wildlife Management, 63(4), 1261-1268. https://doi.org/10.2307/3802843

Osborn, J. M., Hagy, H. M., McClanahan, M. D., & Gray, M. J. (2017). Temporally Robust Models for Predicting Seed Yield of Moist-Soil Plants. Wildlife Society Bulletin, 41(1), 157-161. https://doi.org/10.1002/wsb.735

Collins, D. P., Conway, W. C., Mason, C. D., & Gunnels, J. W. (2017). Seed Yield Prediction Models of Four Common Moist-Soil Plant Species in Texas. The Texas Journal of Agriculture and Natural Resources, 30, 78-86. https://scholarworks.sfasu.edu/agriculture_facultypubs/28

Nakagawa, S. and Schielzeth, H. (2013), A general and simple method for obtaining R2 from generalized linear mixed-effects models. Methods Ecol Evol, 4: 133-142. https://doi.org/10.1111/j.2041-210x.2012.00261.x

Bartoń, K. (2020). MuMIn: Multi-Model Inference. R package version 1.43.17. https://CRAN.R-project.org/package=MuMIn

Title of SOP 5 should reference article titles that have been published about d2m models.